New High-Speed and Low-Power Radix-$2^r$ Multiplication Algorithms
Abstract
In this paper, a new recursive multibit recoding multiplication algorithm is introduced. It provides a general space-time partitioning of the multiplication problem that not only drastically reduces the number of partial products (N/r), but also eliminates the need to precompute odd multiples of the multiplicand in higher-radix (r≥3) multiplication. Based on a mathematical proof that any higher radix $2^r$ can be recursively derived from a combination of two or more lower radices, a series of generalized radix-$2^r$ multipliers is generated from four primary radices. A variety of higher-radix two's complement 64x64 bit serial/parallel multipliers are implemented on a Virtex-6 FPGA and characterized in terms of multiply-time, energy consumption per multiply-operation, and area occupation for r values ranging from 2 to 64. Compared to a recently published algorithm, savings of 21%, 53%, and 105% are obtained in terms of speed, power, and area, respectively.

Keywords: High-Radix Multiplication; Low-Power Multiplication; Multibit Recoding Multiplication; Partial Product Generator (PPG)

I. BACKGROUND AND MOTIVATION

The continuous refinement of the most widely used design paradigm, the modified Booth algorithm [1] combined with a reduction tree (carry-save-adder array, Dadda, ...), has reached saturation. In [2], only slight improvements are achieved: the proposal reduces the number of partial products from N/2+1 to N/2 using different circuit optimization techniques on the critical path. Theoretically, only the signed multibit recoding multiplication algorithm [3] is capable of a drastic reduction (N/r) of the number of partial products, where r+1 is the number of multiplier bits treated simultaneously (1≤r≤N). Unfortunately, this algorithm requires the precomputation of a number of odd multiples of the multiplicand (up to $(2^{r-1}-1) \cdot X$) that scales linearly with the radix $2^r$.
The large number of odd multiples not only requires a considerable number of multiplexers to perform the complex recoding in the PPG, but also dramatically increases the routing density. The resulting overhead offsets the speed and power benefits of the compression factor (N/r). This is the main reason why the multibit recoding algorithm was abandoned. In practice, designs do not exceed r=3 (radix-8).

The current trend [4][5] relies on advanced arithmetic to determine minimal number bases that represent the digits resulting from larger multibit recoding. The objective is to eliminate information redundancy inside the (r+1)-bit slices for a more compact PPG. This is achievable as long as no odd multiples, or only very few, are required. In [4], Seidel et al. introduced a secondary recoding of the digits issued from an initial multibit recoding for 5≤r≤16. The recoding scheme is based on a balanced complete residue system. Though it significantly reduces the number of partial products (N/r for 5≤r≤16), it requires some odd multiples for r≥8. In [5], Dimitrov et al. proposed a new recoding scheme based on the double-base number system for 6≤r≤11. That algorithm is limited to unsigned multiplication and requires a larger number of odd multiples.

Instead of looking for more effective number bases, which is a hard mathematical task, our approach consists in exploiting four existing odd-multiple-free recoding algorithms to recursively build up generalized odd-multiple-free radix-$2^r$ recoding schemes. To achieve this goal, the multibit recoding multiplication algorithm [3] is revisited. Its design space is extended by the introduction of a new recursive version that enables a hardware-friendly space-time partitioning of the multiplication problem.
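The trade-off described above, fewer partial products against more precomputed odd multiples, can be tabulated with a short sketch. It assumes the plain signed multibit recoding with digits in the range $[-2^{r-1}, 2^{r-1}]$, which is what the largest odd multiple quoted above implies; the function names `odd_multiples` and `partial_products` are hypothetical and do not come from the paper.

```python
def odd_multiples(r):
    """Odd multiples k*X with k >= 3 needed when r+1 multiplier bits are recoded.

    Assumption: digits lie in [-2**(r-1), 2**(r-1)]. Even digits are shifts of
    a smaller odd digit, and 1*X is the multiplicand itself, so only the odd
    values 3, 5, ..., 2**(r-1) - 1 have to be precomputed.
    """
    return list(range(3, 2 ** (r - 1), 2))


def partial_products(n_bits, r):
    """Number of partial products for an n_bits multiplier: ceil(n_bits / r)."""
    return -(-n_bits // r)


if __name__ == "__main__":
    print("r  partial products (N=64)  odd multiples  largest")
    for r in (1, 2, 3, 4, 5, 8):
        odd = odd_multiples(r)
        largest = odd[-1] if odd else "none"
        print(r, partial_products(64, r), len(odd), largest)
```

For r=1 and r=2 the list is empty, while r=3 (radix-8) already needs 3X. This is consistent with the remark that practical designs stop at radix-8.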
Depending on the value of r, ranging from 2 to N, highly scalable signed multipliers with various levels of parallelism and latency can be systematically generated with insignificant control complexity. The new algorithm also has the merit of recursively reducing the number of partial products (N/r) without any limit on the parameter r and without any need for odd multiples of the multiplicand. It also allows different recoding schemes proposed in the literature to be combined in the same architecture for better multiplier performance.

Several higher-radix two's complement 64x64 bit serial/parallel multipliers based on combined recoding schemes are implemented on a Virtex-6 FPGA and characterized in terms of speed, power, and area occupation for r values ranging from 2 to 64. Compared to a new signed version of the Dimitrov et al. algorithm [5] and to the Seidel et al. algorithm [4], outstanding results are obtained with the new multibit recoding scheme for r=8, formed by combining the Seidel algorithm (r=5), the MacSorley algorithm (r=2) [1], and the Booth algorithm (r=1) [6]. The respective savings are 21%, 53%, 105% and 8%, 52%, 63% in terms of multiply-time, energy consumption per multiply-operation, and total gate count.

The paper is organized as follows. Section I outlines the main requirement specifications for a generalized radix-$2^r$ multiplication. Section II introduces the new recursive multibit recoding multiplication algorithm. A number of high-radix variants of the new algorithm, together with their implementation results, are presented in Section III.

This work is supported by “Centre de Développement des Technologies Avancées” (CDTA), Algiers, Algeria, in collaboration with FEMTO-ST Institute, Besançon, France.

II. THE NEW RECURSIVE MULTIBIT RECODING MULTIPLICATION ALGORITHM

Equation (2.1.2) of the original multibit recoding algorithm presented in [3] does not offer hardware visibility.
Let us rewrite it in a simpler hardware-friendly form, as follows:

$$Y = \sum_{j=0}^{N/r-1} \bigl( y_{rj-1} + 2^{0} y_{rj} + 2^{1} y_{rj+1} + 2^{2} y_{rj+2} + \cdots$$
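The following sketch checks this kind of recoding numerically. The formula above is cut off, so the closing terms of each slice are an assumption: the code uses the standard signed multibit recoding form, in which the weights run up to $2^{r-2}$ and the top bit of the slice carries the negative weight $-2^{r-1}$, with $y_{-1}=0$. The helper names (`to_bits`, `digit`, `recode`, `multiply`, `split_digit`) are hypothetical. The last helper illustrates, in software only, how one radix-$2^8$ digit can be rebuilt from slices of width 5, 2, and 1, mirroring the r=8 combination mentioned in Section I; it is not the authors' hardware architecture.

```python
import random


def to_bits(y, n_bits):
    """Two's complement bits of y, least significant bit first."""
    return [(y >> i) & 1 for i in range(n_bits)]


def digit(bits, k, r):
    """Signed digit of the slice y[k-1 .. k+r-1] (r+1 bits).

    Assumed standard form:
        y[k-1] + sum(2**i * y[k+i] for i < r-1) - 2**(r-1) * y[k+r-1]
    """
    def y(i):
        if i < 0:
            return 0            # y[-1] = 0 by convention
        if i >= len(bits):
            return bits[-1]     # sign extension past the top bit
        return bits[i]

    middle = sum((1 << i) * y(k + i) for i in range(r - 1))
    return y(k - 1) + middle - (1 << (r - 1)) * y(k + r - 1)


def recode(y_val, n_bits, r):
    """Recode an n_bits two's complement multiplier into ceil(n_bits / r) digits."""
    bits = to_bits(y_val, n_bits)
    return [digit(bits, k, r) for k in range(0, n_bits, r)]


def multiply(x, y_val, n_bits, r):
    """Sum the partial products digit * x * 2**(r*j)."""
    digits = recode(y_val, n_bits, r)
    return sum(d * x * (1 << (r * j)) for j, d in enumerate(digits))


def split_digit(bits, k, widths):
    """Rebuild one wide digit from consecutive narrower slices."""
    total, offset = 0, 0
    for w in widths:
        total += digit(bits, k + offset, w) * (1 << offset)
        offset += w
    return total


if __name__ == "__main__":
    random.seed(0)
    N = 64
    for _ in range(200):
        x = random.randrange(-2 ** (N - 1), 2 ** (N - 1))
        y_val = random.randrange(-2 ** (N - 1), 2 ** (N - 1))
        bits = to_bits(y_val, N)
        assert multiply(x, y_val, N, 8) == x * y_val
        for k in range(0, N, 8):
            assert digit(bits, k, 8) == split_digit(bits, k, (5, 2, 1))
    print("all checks passed")
```

The split works because adjacent slices overlap by one bit: the top bit of a lower slice has weight $-2^{s-1}$ there and weight $+2^{s}$ as the lowest bit of the next slice, which sums to the $+2^{s-1}$ it carries inside the wide digit.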
Similar resources
High speed Radix-4 Booth scheme in CNTFET technology for high performance parallel multipliers
A novel and robust radix-4 Booth scheme implemented in Carbon Nanotube Field-Effect Transistor (CNTFET) technology is presented in this paper. The main advantage of the proposed scheme is its improved speed performance compared with previous designs. With the help of modifications applied to the encoder section using Pass Transistor Logic (PTL), the corresponding capacitances o...
A New algorithm for multiple constant multiplications with low power consumption
Multiplications are costly operations in FIR filters. But for any given filter, the filter weights are constants. Several techniques have been developed over the years for the efficient realization of constant multiplications by a network of add/subtract-shift operations [6]. Constant multiplication methods are broadly of two types, (i) single constant multiplication (SCM) methods and (ii) mult...
Design and Simulation of Radix-8 Booth Encoder Multiplier for Signed and Unsigned Numbers
The multiplication operation is present in many parts of a digital system or digital computer, most notably in signal processing, graphics and scientific computation. With advances in technology, various techniques have been proposed to design multipliers that offer high speed, low power consumption, and smaller area, making them suitable for various high-speed, low-power, compact VLSI imp...
Modified 32-Bit Shift-Add Multiplier Design for Low Power Application
Multiplication is a basic operation in any signal processing application. Among the four arithmetic operations (addition, subtraction, multiplication, and division), multiplication is the most important. Multipliers are usually hardware intensive, and the main parameters of concern are high speed, low cost, and small VLSI area. The propagation time and power consumption in the multiplier are always high. ...
A New Low Power 32×32-bit Multiplier
Multipliers are one of the most important building blocks in processors. This paper describes a low-power 32×32-bit parallel multiplier, designed and fabricated using a 0.13 μm double-metal double-poly CMOS process. In order to achieve low-power operation, the multiplier was designed utilizing mainly pass-transistor logic circuits, without significantly compromising the speed performance of the ...